
@ulysses4ever
Collaborator

Depends-on: #18

If I manually remove toLinear, these are the numbers I'm seeing on 8 physical CPUs:

❯ cabal run benchrunner -- 10 "Mergesort" Seq 1000000 2>/dev/null | grep SELFTIMED
SELFTIMED: 0.206165608

❯ cabal run benchrunner -- 10 "Mergesort" Par 1000000 2>/dev/null | grep SELFTIMED
SELFTIMED: 0.256841832

❯ cabal run benchrunner -- 10 "Mergesort" Par 1000000 +RTS -N8 2>/dev/null | grep SELFTIMED
SELFTIMED: 0.189847072
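For a quick reading of these three timings (lower is better), here are the ratios between them. This is only arithmetic on the SELFTIMED values above. `speedup` is a hypothetical helper, and "Par (no -N)" refers to the run without `+RTS -N8`.

```haskell
module Main (main) where

import Text.Printf (printf)

-- SELFTIMED values (seconds) copied from the comment above.
seqTime, par1Time, par8Time :: Double
seqTime  = 0.206165608  -- Seq
par1Time = 0.256841832  -- Par, no -N flag
par8Time = 0.189847072  -- Par, +RTS -N8

-- | Hypothetical helper: baseline time divided by new time.
-- A value above 1 means the new run is faster than the baseline.
speedup :: Double -> Double -> Double
speedup baseline t = baseline / t

main :: IO ()
main = do
  printf "Par (no -N) vs Seq: %.2fx\n" (speedup seqTime par1Time)   -- 0.80x
  printf "Par -N8 vs Seq:     %.2fx\n" (speedup seqTime par8Time)   -- 1.09x
  printf "Par -N8 vs Par:     %.2fx\n" (speedup par1Time par8Time)  -- 1.35x
```

So, with toLinear removed, Par on 8 capabilities is about 1.35x faster than Par without -N, and about 1.09x faster than Seq.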

@ulysses4ever
Collaborator Author

It looks like toLinear doesn't make much difference: single-digit percentages. It also hurts the sequential version a little more than the parallel one.
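For readers who haven't used it: linear-base provides `toLinear` (in `Unsafe.Linear`) for using a non-linear function where a linear one is expected. Below is a minimal sketch of the idea for the unrestricted case only, assuming GHC 9.0 or later with `LinearTypes`. `toLinearSketch` and `dup` are hypothetical names, and this is not linear-base's actual definition.

```haskell
{-# LANGUAGE LinearTypes #-}
module Main (main) where

import Unsafe.Coerce (unsafeCoerce)

-- | Hypothetical stand-in for toLinear (unrestricted case only):
-- it relabels an ordinary function as a linear one. Multiplicities are
-- erased at runtime, so this is a cast rather than a wrapper doing work.
toLinearSketch :: (a -> b) -> (a %1 -> b)
toLinearSketch = unsafeCoerce

-- | Using the argument twice is rejected in a genuinely linear function,
-- so it has to go through the unsafe cast.
dup :: a %1 -> (a, a)
dup = toLinearSketch (\x -> (x, x))

main :: IO ()
main = print (dup 'x')  -- ('x','x')
```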

ulysses4ever force-pushed the performance-tuning-par branch 2 times, most recently from 4ada11e to 0a8b99b on April 1, 2025 19:37
@ulysses4ever
Collaborator Author

I rebased on main, but performance regressed, sadly:

❯ cabal run benchrunner -- 10 "Mergesort" Seq 1000000 2>/dev/null | grep SELFTIMED
SELFTIMED: 0.213269371

❯ cabal run benchrunner -- 10 "Mergesort" Par 1000000 2>/dev/null | grep SELFTIMED
SELFTIMED: 1.150078171

(Compare to the numbers in the OP.)
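For scale, the post-rebase Par time against the pre-rebase Par time from the OP and against the post-rebase Seq time:

$$
\frac{1.150078171}{0.256841832} \approx 4.48,
\qquad
\frac{1.150078171}{0.213269371} \approx 5.39
$$

That is, the rebased Par run was roughly 4.5x slower than before the rebase and about 5.4x slower than Seq.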

I'll have to get back to it...

@ulysses4ever
Collaborator Author

The cause was a silly mistake I made while rebasing. I'll fix it today.

ulysses4ever force-pushed the performance-tuning-par branch from 0a8b99b to ea30799 on April 2, 2025 02:09
@ulysses4ever
Collaborator Author

I think this is good to go. Can someone take a look?

ulysses4ever marked this pull request as ready for review on April 2, 2025 02:14
ulysses4ever force-pushed the performance-tuning-par branch from ea30799 to ed50bc6 on April 2, 2025 02:37
@ulysses4ever
Collaborator Author

I'm expediting this to unblock progress on other PRs. This is a performance-only patch. In my quick back-of-the-envelope evaluation, it brings Mergesort Par on one thread on par with Mergesort Seq and shows reasonable scaling as the number of CPUs increases. The main changes are INLINE/INLINABLE pragmas and a manual worker/wrapper transformation in a couple of places.
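The diff itself isn't shown in this thread, so here is a hypothetical sketch of what these two techniques typically look like on a simple list-based mergesort. `msort`, `go` and `merge` are illustrative names, not the project's code. The wrapper is small and non-recursive so it can be marked INLINE, the recursion lives in a local worker, and the recursive `merge` is marked INLINABLE so its unfolding stays available for specialisation.

```haskell
module Main (main) where

-- | Wrapper: small and non-recursive, so GHC can inline it at call sites.
-- It computes the length once and hands off to the recursive worker.
msort :: Ord a => [a] -> [a]
msort xs = go (length xs) xs
  where
    -- Worker: does the recursion; n is the number of elements in ys.
    go n ys
      | n <= 1    = ys
      | otherwise =
          let h      = n `div` 2
              (l, r) = splitAt h ys
          in merge (go h l) (go (n - h) r)
{-# INLINE msort #-}

-- | Recursive, so GHC will not inline it, but INLINABLE keeps its
-- unfolding in the interface file so callers, including ones in other
-- modules, can get a copy specialised to their concrete Ord instance.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge xa@(x:xs) ya@(y:ys)
  | x <= y    = x : merge xs ya  -- take from the left on ties (stable)
  | otherwise = y : merge xa ys
{-# INLINABLE merge #-}

main :: IO ()
main = print (msort [5, 3, 8, 1, 9, 2 :: Int])  -- [1,2,3,5,8,9]
```

The split keeps the part callers see cheap to inline, while the loop stays a single compiled worker; whether this pays off in a given benchmark still has to be measured, as done above.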

ulysses4ever merged commit c66e51d into main on Apr 2, 2025
5 checks passed
ulysses4ever deleted the performance-tuning-par branch on April 2, 2025 14:24